Managing and Aggregating Data Transfers in Data Centers

Authors

  • Deke Guo
  • Mo Li
  • Hai Jin
  • Xuanhua Shi
  • Lu Lu
Abstract

Distributed computing applications like MapReduce transfer massive amounts of data between their successive processing stages. These data transfers, such as the common shuffle and incast communication patterns, contribute most of the network traffic and thus have a severe impact on application performance. Despite this impact, there has been relatively little work on decreasing the amount of traffic that such data transfers generate. We observe that the data flows in such a transfer are already subject to aggregate functions at the receiver side, and that the reduction in size from the input data to the output data is pronounced. This motivates us to perform inter-flow data aggregation during the transmission phase, as early as possible, rather than only at the receiver side. To this end, we first demonstrate the gain and feasibility of inter-flow data aggregation for data transfers in data centers with novel network structures. To achieve such a gain, these data transfers are normalized as incast transfers. The incast transfer is then modeled as an incast minimal tree problem, which is proved to be NP-hard in the representative BCube and FBFLY data centers. We propose two approximate methods, the RS-based and ARS-based incast tree building methods, which generate an efficient incast tree based only on the labels of all incast members and the data center topology. We further present incremental methods to handle dynamic behavior and fault tolerance in the incast tree. Using a prototype implementation and large-scale simulations, we demonstrate that our approach can significantly decrease the amount of network traffic, save data center resources, and reduce the delay of the entire job. Moreover, our proposals for BCube and FBFLY can also be applied to other novel data centers after minimal modifications.
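
The traffic saving from aggregating flows inside the network, rather than only at the receiver, can be illustrated with a small sketch. This is not the RS-based or ARS-based tree-building method from the paper: the tree, the node names (`s1`, `s2`, `s3`, `a`, `r`), the data, and both function names are hypothetical, and traffic is simply counted as the number of key-value pairs crossing each link, assuming a sum-like aggregate function.

```python
from collections import Counter

# Hypothetical incast tree: three senders forward to node "a", which forwards to receiver "r".
PARENT = {"s1": "a", "s2": "a", "s3": "a", "a": "r"}

# Hypothetical per-sender data: key -> partial count (as in a word-count shuffle).
DATA = {
    "s1": {"x": 1, "y": 2},
    "s2": {"x": 3, "y": 1},
    "s3": {"x": 2, "y": 5},
}


def traffic_without_aggregation(parent, data, root):
    """Every flow travels unmerged to the root: cost = pairs sent x hops travelled."""
    total = 0
    for node, pairs in data.items():
        hops = 0
        cur = node
        while cur != root:
            cur = parent[cur]
            hops += 1
        total += len(pairs) * hops
    return total


def traffic_with_aggregation(parent, data, root):
    """Every node merges the flows it receives with its own data before forwarding."""
    children = {}
    for child, par in parent.items():
        children.setdefault(par, []).append(child)

    total = 0

    def collect(node):
        nonlocal total
        merged = Counter(data.get(node, {}))
        for child in children.get(node, []):
            merged.update(collect(child))  # sum the counts key by key
        if node != root:
            total += len(merged)  # pairs sent on the link from node to its parent
        return merged

    result = collect(root)
    return total, result


if __name__ == "__main__":
    print(traffic_without_aggregation(PARENT, DATA, "r"))
    print(traffic_with_aggregation(PARENT, DATA, "r"))
```

In this toy tree, the three flows share the same keys, so the intermediate node forwards one merged flow instead of three. The receiver ends up with the same aggregate either way; only the number of pairs carried over the last link changes.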





Publication date: 2012